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Abstract.  We  estimate  via  simulation  the  expectation  of  certain  integrals  of 
functionals  of  continuous-time  Markov  chains  over  a  finite  horizon,  fixed  or  ran¬ 
dom.  By  computing  conditional  expectations  given  the  sequence  of  states  visited 
(and  possibly  other  information),  we  reduce  variance.  This  is  discrete-time  con¬ 
version.  We  further  increase  efficiency  by  combining  discrete-time  conversion 
with  stratification  and  splitting. 

Key  words,  simulation,  finite-horizon,  continuous-time  Markov  chains,  vari¬ 
ance  reduction. 
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1  Introduction 


Estimating  expected  cumulative  “reward”,  possibly  continuously  discounted, 
up  to  a  finite  horizon  r  has  practical  interest.  This  and  other  examples  in  our 
paper  are  special  cases  of  the  following;  estimating  expectations  of  integrals  of 
functionals  /  of  a  continuous-time  Markov  chain  A'  with  respect  to  a  weight 
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function  G.  Here  f(s)  is  the  “reward”  rate  when  in  state  s.  We  deal  with  two 
general  classes  of  such  integrals.  Both  have  the  form 

r  f{x(i))G{dt) 

Jo 

and  differ  only  in  the  definition  of  r.  By  selecting  /  and  G  appropriately,  many 
standard  problems  become  special  cases  as  Sections  3  and  -5  detail.  We  estimate 
the  expectations  of  these  integrals  via  simulation.  This  is  carried  out  efficiently 
via  discrete-time  conversion:  computing  conditional  expectations  given  the  se¬ 
quence  of  states  visited  (and  possibly  other  information).  Section  2  shows  that 
the  less  we  condition  on,  the  more  variance  is  reduced.  However,  the  work  W 
to  compute  conditional  expectations  depends  on  what  we  condition  on.  Section 
2  shows  that  we  want  to  minimize  even  when  the  conditional  expec¬ 

tation  and  the  work  to  compute  it  are  correlated.  Especially  in  sections  4  and 
6,  we  discuss  implementation  and  estimate  the  order  of  magnitude  of  the  work 
involved. 

Section  3  provides  theory  for  random-time  horizons,  where  r  is  the  hitting 
time  of  a  specified  subset  of  the  state  space.  We  relate  this  to  regenerative 
steady-state  simulations.  Section  5  provides  theory  for  fixed-time  horizons  r. 
Proofs  are  deferred  to  the  appendix. 

The  estimators  in  sections  3  and  5  are  springboards  to  significant  improve¬ 
ments  in  sections  4  and  6  respectively.  New  ideas  (but  not  new  theory)  are 
introduced  in  sections  4  and  6;  these  ideas  make  our  theoretical  results  more 
important  and  more  practical. 

2  Preliminaries 

As  indicated  in  the  introduction,  much  of  this  paper  develops  variance  reduction 
techniques  for  continuous-time  Markov  chains  based  on  appropriately  “condi¬ 
tioning  out”  holding  times.  Such  methods  are  but  special  cases  of  the  general 
variance  reduction  technique  known  as  conditional  Monte  Carlo. 

To  set  the  stage  for  a  discussion  of  conditional  Monte  Carlo,  we  first  math¬ 
ematically  characterize  the  efficiency  of  an  estimator.  Suppose  that  we  wish 
to  estimate  a  parameter  q  that  can  be  expressed  as  a  =  ER  for  some  r.v.  R. 
The  parameter  o  can  be  calculated  by  generating  iid  copies  R\,R2,...  of  R 
and  forming  their  sample  mean.  Let  0{Ri)  be  the  amount  of  computer  time 
required  to  generate  /?,.  We  assume  (reasonably)  that  the  pairs  (/Zj,  0{Ri))  are 
iid,  though  Ri  and  0(Ri)  may  be  correlated. 

Given  a  computer  budget  t,  the  number  of  observations  completed  within 
the  budget  is 

n 

N(t)  =  max{n  >  0 

1=1 
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The  sample  mean  r{t)  formed  with  the  above  budget  constraint  is 


r(0  =  /  why  5  ^  ^ 

\0  ;Ar(t)  =  0. 

The  (asymptotic)  efficiency  of  the  estimator  r(<)  is  determined  by  how 
quickly  r(t)  converges  to  a  as  the  budget  t  goes  to  infinity.  This  rate  of  conver¬ 
gence  is  characterized  by  the  central  limit  theorem  for  r(<). 

THEOREM  1.  Suppose  that  0  <  E[0{Ri)]  <  oo  and  that  o^{Ri)  =  var  J?,  < 
oo.  Then, 

as  t  — »  oo. 


This  theorem  is  elementary  only  when  P{Ri)  is  deterministic. 

Theorem  1  suggests  defining  the  asymptotic  efficiency  of  the  estimator  r(/) 
as  the  reciprocal  of  E[0{R\)\<t^{Ri).,  Hammersley  and  Handscomb  [(1964),  p. 
51]  suggest  the  same  figure  of  merit  without  providing  theoretical  justifica¬ 
tion.  Thus,  the  efficiency  of  a  simulation  algorithm  increases  when  the  product 
E[0{R\)]<t’^{R\)  decreases.  This  permits  obtaining  an  improved  estimator  in 
which  either  E[0{Ri)]  or  <t^(/2i)  is  increased,  so  long  as  the  product  decreases. 

We  now  apply  these  ideas  to  our  discussion  of  conditional  Monte  Carlo. 
Suppose  that  R  is  an  /’-measurable  r.v.  and  let  Q  and  7i  be  sub-rr-fields  of  /" 
such  that  gen.  Set  Rg  =  E[R\g],  Rn  =  E[R\n].  If  E\R\  <  oo,  it  is  well 
known  that  a  =  ER  =  ERg  —  ER-h-  Hence,  competing  estimators  for  q,  based 
on  averaging  replicates  of  Rg  and  Rn,  can  be  considered;  such  estimators  are 
known  as  conditional  Monte  Carlo  estimators.  Let  rp(<),  r7<(t)  be  the  estimators 
formed  from  sample  means  of  iid  copies  of  the  r.v.’s  Rq  and  Rn,  respectively. 
From  Theorem  2.1,  we  find  that  the  efficiencies  of  the  estimators 
and  rT<(<)  are  the  reciprocals  of  the  products  E[0{R)](t'^ {R) ,  E[0{Rc)]<t'^ {Rg) , 
and  E{0{Rn)\<f^{Rn),  respectively.  The  quantities  E[0{R)],  E[0{Rg)],  and 
E![0{R‘h)\  measure  the  mean  computation  time  to  generate  the  three  types  of 
observations.  These  quantities  are  hard  to  quantify  precisely,  since  the  com¬ 
puter  time  to  generate  an  observation  is  implementation  dependent  (although 
their  respective  orders  of  magnitude  sometimes  can  be  estimated).  On  the  other 
hand,  we  can  order  a  priori  the  variances  (T^{R),<T^(Rg),  7^{Rn),  as  the 

following  well-known  proposition  shows  (see  Problem  6  oi  •'■ae-;  305  of  Chung 
(1974)  for  an  equivalent  statement). 

PROPOSITION  1.  If  var  R  <  oo,  then  var  Rg  <  var  Rn  <  var  R. 
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Proposition  1  states  that  conditioning  on  less  reduces  variance  (by  “integrat¬ 
ing  out”  more  randomness).  An  extreme  case  is  ^  =  {<&,  fl);  here  =  ER  has 
zero  variance,  but  can’t  be  found  exactly. 

This  last  example  illustrates  the  compromises  in  choosing  the  appropriate 
conditioning  variables  upon  which  to  apply  the  method  of  conditional  Monte 
Carlo.  The  choice  is  a  tradeoff  between  the  amount  of  variance  reduction  ob¬ 
tained  (as  measured  by  <t^(R^))  and  the  implementation  difficulty  inherent  in 
computing  the  conditional  expectation  Rq  (as  measured  by  ERq).  This  need 
for  compromise  is  a  focus  of  our  discussion  in  subsequent  sections. 


3  Discrete-time  conversion  for  random-time  hori¬ 
zons:  theory 

Let  X  =  (X(<)  :  <  >  0)  be  a  non-explosive  continuous-time  Markov  chain  living 
on  state  space  S  and  let  /  be  a  real-valued  function  defined  on  5.  For  B  £  S, 
let  T{B)  =  inf{t  >  0  :  A'(<)  €  B,X{t—)  ^  B)  be  the  first  “hitting  time” 
of  the  subset  B.  We  further  let  G  :  [0,oo)  — ♦  [0,oo)  be  a  right-continuous 
(deterministic)  non-decreasing  function,  which  then  acts  as  an  “integrator”. 

In  this  section  we  apply  the  method  of  conditional  Monte  Carlo  to  compute 
Q  =  E[I],  where 

^  =  /  /(X(t))G(dt).  (1) 

Assuming  that  /(i)  is  interpreted  as  the  rate  at  which  “cost”  is  incurred  while 
the  process  occupies  state  x,  the  above  estimation  problem  arises  in  several 
different  settings. 

SETTING  1.  If  G(<)  =  t,  then  I  corresponds  to  the  total  cost  accumulated 
by  the  process  X  up  to  time  T{B). 

SETTING  2.  If  G(i)  =  r“^(l  — e'”*)  where  r  >  0,  then  /  corresponds  to  the 
r-discounted  cost  accumulated  by  X  up  to  time  T{B). 

SETTING  3.  Given  T  >  0,  suppose  that  the  system  is  charged  a  cost  which 
depends  on  the  state  occupied  at  time  T.  We  assume  that  if  the  process  hits 
B  before  T,  no  cost  is  charged  to  the  system.  If  /(x)  is  the  cost  incurred  when 
the  system  occupies  state  x  at  time  T,  then  the  expected  cost  a  —  where 
I  takes  the  form  (1)  and  G(f)  =  /(t  >  T).  If  /  =  1,  then  a  =  P{T{B)  >  T),  so 
that  o  is  the  right  tail  of  the  cumulative  distribution  function  of  T{B). 

Although  it  may  appear  that  the  estimation  problem  considered  here  per¬ 
tains  only  to  finite-horizon  problems,  it  turns  out  to  be  also  relevant  to  the 
infinite-horizon  steady-state.  If  X  is  irreducible  and  positive  recurrent  on  state 
space  S,  the  process  X  is  regenerative  with  respect  to  consecutive  hitting  times 
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of  any  state  z  £  S.  Assuming  suitable  moment  hypotheses  are  in  force,  regen¬ 
erative  process  theory  (Qlinlar  (1975),  §9.2)  shows  that  the  steady-state  rate  q 
at  which  cost  accrues  is  given  by  the  well-known  ratio  formula 

E,T'(z) 

where  T'(z)  =  T({z})  and  £,(•)  is  the  expectation  operator  conditional  on 
X(0)  =  2.  Both  the  numerator  and  denominator  have  the  form  (1),  so  variance 
reduction  techniques  developed  for  (1)  are  applicable  to  regenerative  steady- 
state  simulation  of  X-,  e.g.,  see  Fox  and  Glynn  (1986).  Via  Little’s  law  (e.g.,  see 
Glynn  and  Whitt  (1989),  this  lets  us  estimate  expected  customer  sojourn  time 
in  system  in  steady-state  efficiently. 

To  apply  conditional  Monte  Carlo  to  (1),  we  condition  on  the  embedded 
chain  Z  =  (Zq,  Zi, . . .),  where  Zi  ^  Zi-i  and  Z,-  is  the  state  visited  by  X 
just  after  jump  i.  Let  r,  be  the  time  between  jumps  i  and  i  -b  1,  so  that 

5(n)  =  To  -f - Tn  is  the  instant  at  which  X  makes  its  (n  -t-  l)-st  jump.  Put 

5(  — 1)  =  0  and  let  //(B)  =  inf{n  >  1  :  Z„  €  B).  Note  that 

'  =  E 

n=0 

=  XI  /(^n)[G(5(n)-)-G(5(n-l)-)].  (2) 

n=0 

Assuming  that  £  t(b))  1/(-^(0)I^(<^0  <  conditional  expec¬ 

tations  of  both  sides  of  (2)  with  respect  to  Z,  yielding 

H{B)-1 

E[I\Z]=  f(Z„){E[G(S(n)-)lZ]-E[GiSin-l)-)\Z}.  (3) 

n=0 

Let  Q  =  (Q(x,y)  :  x,y  £  S)  he  the  generator  of  X  and  let  A(x)  =  —Q{x,x). 
Continuous-time  Markov  chains  have  the  convenient  property  that,  conditional 
on  Z,  the  holding  times  ro,  n, . . .  are  conditionally  independent  with  conditional 
distributions  P{r,  £  dt\Z}  =  fz,(^)dt,  where  /r(»)  is  an  exponential  density 
with  parameter  A(x).  Hence, 

E[G(S(n)-)|Z]  =  r  G(<)(/z„  *  •  •  •  *  fz.)(t)dt  (4) 

Jo 

for  n  >  0,  where  ♦  denotes  convolution.  We  don’t  need  G(t~)  in  the  integrand 
because  G(i)  =  G(t~)  for  all  but  at  most  countably  many  t.  Combining  (3) 
and  (4)  yields  an  expression  for  Iz  —  E\I\Z],  from  which  a  conditional  Monte 
Carlo  estimator  for  o  can  be  obtained.  We  say  that  the  resulting  estimator 


L 


[5(n-.l),S(n)) 


fiX(t))G(dt) 
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is  obtained  by  “discrete-time  conversion”  since  the  new  estimator  depends  on 
X  only  through  a  discrete-time  chain,  so  that  the  holding  times  need  not  be 
simulated. 

As  discussed  in  Section  2,  the  efficiency  of  the  estimator  based  on  Iz  is 
determined  by  var  Iz  and  by  EP{Iz)-  Depending  on  the  problem,  the  variance 
difference  var  I  —  var  Iz  can  take  on  any  value  between  zero  and  infinity,  as  the 
following  example  illustrates. 

EXAMPLE  1.  Let  X  be  a  pure  birth  process  on  the  non-negative  integers 
with  constant  birth  rate  equal  to  A.  Suppose  X(0)  =  0.  If  G(i)  =  i,  then 
7  =  T'(n)  and  E[T'(n)\Z]  =  n/A  =  ET'{n),  so  var  Iz  =  0.  On  the  other  hand, 
T'(n)  is  an  Erlang-n  r.v.  with  scale  parameter  A,  so  var  /  =  n/A^.  Hence,  the 
variance  reduction  can  be  made  either  arbitrarily  large  or  small,  depending  on 
how  one  chooses  n  and  A. 


Since  we  expect  that  the  variance  reduction  will  be  (at  least)  moderate  in 
most  practical  examples,  the  estimator  batsed  on  Iz  is  more  efficient  provided 
that  the  time  required  to  compute  the  conditional  expectation  is  at  most  mod¬ 
erately  more  than  that  to  compute  7.  Note  that  Iz  requires  simulation  only 
of  Z]  holding  times  need  not  be  generated.  Unfortunately,  the  convolution  in 
(4)  can  be  expensive  to  compute.  However,  for  two  important  choices  of  G,  the 
convolution  (4)  is  relatively  cheap  to  calculate.  For  such  G’s,  the  discussion  of 
Section  2  shows  that  the  discrete-time  estimator  Iz  is  a  clear  winner  over  7. 

SETTENG  1  (continued).  If  (7(t)  =  t,  then  (4)  is  just  the  expected  value  of 
the  sum  ofn-(-l  independent  exponential  r.v.’s  with  parameters  \{Zo),  •  •  ■ ,  A(Z„). 

So,  we  obtain  .£[G(5(n)— )|Z]  l/X{Zt)  and 

EII\Z]=  Y.  fiZn)/\{Zn). 

n=0 

This  estimator  was  first  studied  in  a  regenerative  steady-state  context  by  Hordijk, 
Iglehart,  and  Schassberger  (1976),  although  it  weis  not  analyzed  using  the  prin¬ 
ciple  of  conditional  Monte  Carlo.  This  estimator  was  further  studied  by  Fox  and 
Glynn  (1986)  in  a  steady-state  context,  where  the  ties  to  regenerative  simulation 
were  cut. 

SETTEVG  2  (continued).  Suppose G(<)  =  r-i(l-e-'‘‘).  Note  that  e-’-‘(/z„* 
•  •  ■  ♦  is  the  Laplace  transform  (evaluated  at  r)  of  the  distribution  of 

the  sum  of  n  independent  exponential  r.v.’s.  Hence, 
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and 


E[I\Z]  = 


k=0  ^ 


(  HZk) 


n  =  0 


A(Zi)  + 


1 

(A(2„)  +  r)- 


An  estimator  similar  to  the  above  was  first  described  by  Fox  and  Glynn  (1989a) 
in  an  infinite-horizon  setting. 


For  general  G,  the  integral  (4)  is  typically  (much)  more  computationally 
expensive  to  calculate,  diminishing  the  attractiveness  of  Iz  as  an  estimator  of 
Q.  This  point  is  illustrated  by  our  next  example. 

SETTING  3  (continued).  For  G{t)  =  I{t  >  T),  we  find  that  (4)  involves 
calculating  the  tail  of  the  distribution  function  (evaluated  at  T)  of  the  sum  of  n-f 
1  independent  exponential  r.v.’s.  Since  the  parameters  of  the  n-(- 1  exponentials 
typically  differ  from  one  another,  this  distribution  function  is  neither  available 
in  closed  form  nor  easy  to  calculate  numerically. 


Because  of  the  above  computational  difficulty  in  calculating  (4),  a  different 
approach  to  discrete-time  conversion  may  be  preferable.  Suppose  that  X  is  now 
a  uniformizable  continuous-time  Markov  chain  and  let  A  =  sup{A(i:)  :  x  6  S}. 
For  6  >  \,  X  can  be  represented  as  A'(t)  =  where  y*  =  (Vq*,  . .)  is 

a  discrete-time  Markov  chain  having  transition  matrix  P{9)  =  0~^{Q  +  91)  and 
Ne{*)  is  a  Poisson  process,  independent  of  Y* ,  having  rate  9.  We  now  develop 
a  conditional  Monte  Carlo  estimator  for  a  based  on  the  conditioning  variables 
y*  =  (yg*,  y/, . . .).  simulating  the  naive  estimator  I  using  the  uniformized 
chain  would  waste  work. 

Let  r/g,  r/i ,  • . .  be  the  iid  exponential  (9)  interevent  times  of  the  Poisson  pro¬ 
cess  N)  and  let  zz  tjq  +  + - h  be  the  instant  at  which  Ng  makes  its 

(n  -t-  l)-st  jump.  Put  Je{B)  =  inf{n  >  1 : ^  B,Y^  G  B]  and  observe  that 

MB)-1 

n  =  0 

Since  the  T^'s  are  independent  of  y®,  it  follows  that 

/.(B)- I 

E[I\Y^]=  E  ny^)[EGiTt-)  -  EG{Ji_,-)].  (5) 

n=0 

Noting  that  TjJ  is  an  Erlang  r.  v.  with  shape  parameter  n  +  1  and  scale  parameter 
9,  we  find  that 

roo  an+\fn.-et 

EG{Tt-}=  /  G(0 - j - dt.  (6) 

Jo  ”■ 
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Combining  (5)  and  (6)  yields  an  expression  for  7y»  =  E[/|y*]. 

The  principal  advantage  of  ly  over  I2  is  that  the  computation  of  (6)  is 
typically  (much)  cheaper  than  that  of  (4).  For  example,  in  (4),  the  convolution 
fzo  ♦  •  ■  •  ♦  /z.  must  be  evaluated  numerically  at  many  points  before  integrating 
numerically  against  G{t)dt,  whereas  in  (6)  the  convolution  is  calculated  analyt¬ 
ically.  In  addition,  for  certain  (practically  important)  choices  of  G,  (6)  can  be 
calculated  in  closed  form  when  (4)  cannot. 


SETTING  3  (continued).  For  G(i)  =  I(i  >  T),  we  have  that  (6)  equals 
P{7^  >T}  =  P{Nt(T}  <  n}.  Hence, 


/>'*=  E 

n=0 


fT 


(0T)’' 


n! 


We  now  choose  0  optimally.  Recall  that  Y*  can  be  obtained  from  Z  by 
adding  “null  jumps”.  More  precisely,  V*  can  be  represented  as 


i=o  >=o 


(7) 


i=0 


where  j/o,j/f,...  are  conditionally  independent,  given  Z,  and  P{i'^  =  t\Z]  — 


Since  Je{B)  =  stochastically  increasing  in 

$,  clearly  Jg{B)  is  stochastically  increasing  in  6.  Thus,  the  work  required  to 
compute  (5)  is  minimized  by  taking  6  as  small  as  possible,  namely  9  =  \. 

As  for  the  variance  of  /y»,  Y*  differs  from  Y*  only  in  that  it  has  more 
null  jumps.  In  some  sense,  Y*  contains  more  information  than  Y'^  and  hence 
Proposition  1  ought  to  apply,  yielding  the  conclusion  that  var  /y*  is  minimized 
by  taking  6  =  X.  Our  next  theorem  confirms  this  assertion. 


THEOREM  2.  Suppose  that  var  1  <  00.  Then,  var  lyx  <  var  /y»  for  all 
e>  X. 


Hence,  the  efficiency  of  the  estimator  based  on  replicating  /y»  is  maximized 
by  taking  0  =  X. 

This  leaves  us  with  the  question  of  when  lyx  is  more  efficient  than  Iz.  First, 
note  that  J\{B)  >  H{B)  so  that  the  number  of  summands  in  (3)  is  always 
less  than  the  number  in  (5).  Hence,  the  estimator  based  on  lyx  requires  less 
work  than  does  Iz  only  when  the  integral  (4)  is  significantly  more  expensive  to 
compute  than  (6).  For  the  variance  comparison,  it  is  clear  that  the  sequence 
Z  is  a  (deterministic)  function  of  Y*,  so  that  cr(Z)  <  <t(Y^).  Proposition  1 
immediately  yields  the  next  result. 
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THEOREM  3.  Suppose  that  var  I  <  oo.  Then,  var  Iz  <  var  lyi . 

With  this  result  in  hand,  it  is  clear  that  Iz  is  the  discrete-time  estimator  of 
choice  whenever  the  computation  of  (4)  is  not  too  much  more  expensive  than 
that  of  (6).  If  this  condition  is  not  satisfied,  the  choice  is  fuzzier;  we  defer 
discussion  to  the  next  section  on  implementation  issues. 

4  Discrete-time  conversion  for  random-time  hori¬ 
zons:  implementation 

In  this  section,  we  first  consider,  in  subsection  4.1,  how  to  efficiently  calculate 
(4)  [a  key  to  computing  E[I\Z]  in  formula  (3)]  for  general  G.  Next,  in  subsection 
4.2,  we  consider  how  to  implement  formulas  (5)  and  (6)  which  were  based  on 
conditioning  on  V*.  For  finite  horizons,  whether  or  not  it  pays  to  uniformize 
depends  on  the  problem  (via  Theorem  1).  This  carries  over  to  infinite  horizons. 
Subsection  4.3  considers  steady-state  sojourn  time  S.  To  estimate  the  expecta¬ 
tion  of  a  linear  function  of  S,  we  don’t  uniformize.  But  for  a  nonlinear  function, 
we  do. 

Fox  (1989)  and  Fox  and  Young  (1989)  show  how  to  generate  Z  (and  hence 
y*)  quickly.  The  techniques  developed  there  reduce  Ep{Iz)  and  EP{Iy»),  in¬ 
creasing  efficiency. 

4.1  Without  uniformization 

It  can  be  easily  verified  that  when  X(Zo), . . . ,  X(Z„)  are  distinct,  the  convolutions 
in  (4)  take  the  form 


co,„  exp(-A(2o)t)  + - 1-  c„,„  exp(-A(Z„)<).  (8) 

Furthermore,  the  coefficients  co.n.  •  •  • .  Cn,n  can  be  efficiently  calculated  from 
co,n-!,  •  ■  -  iCn-i  n-i  (the  coefficients  associated  with  fz^  *  ■  ■  ■  *  fz,-t)  in  order 
n  operations.  With  the  initial  condition  cqo  =  A(Zo),  we  recursively  solve  for 
the  coefficients  Co.n,  •  •  • ,  Cn.n  appearing  in  (7)  in  order  operations.  When 
the  A(Zj)’s  are  not  distinct,  some  terms  in  (7)  get  multiplied  by  (routinely- 
determined)  powers  of  t  and  other  terms  vanish.  Again,  in  this  ceise,  the  coeffi¬ 
cients  can  be  calculated  recursively. 

To  numerically  evaluate  (4),  we  use  the  recursi'  :  algorithm  described  above 
to  find  the  symbolic  representation  of  the  convolution  and  then  numerically 
evaluate  the  product  of  the  convolution  with  G(t)  on  a  suitably  defined  grid  of 
points. 

In  some  cases,  it  may  pay  to  look  for  (and  exploit)  common  subsequences  of 
A(Zi)’s  across  runs  to  avoid  computing  all  convolutions  for  all  runs  from  scratch. 
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4.2  With  uniformization 

Relation  (5)  can  be  written  in  the  form 


E[I\Y»]=  53  /(2O[fe(xt,0)-fc(xLi.^)]  (9) 

fc=o 

where 

xi 

>=o 

and 

b(n,6)=  /  - —dt. 

Jo 

We  check  whether  b{n,6)  has  already  been  computed  (and  stored)  from  some 
previous  run.  If  it  has,  we  use  it.  If  it  hasn’t,  let  h  be  the  maximum  j  for 
which  b{j,6)  has  been  previously  computed.  We  can  then  recursively  calculate 
b(n  +1,6)  through  b{n,6)  from  b{h,0),  by  using  the  following  ideas: 

i)  on  a  grid  tj  <  ■  •  •  <  we  assume  hat  the  quantities 

n! 

(1  <  t  <  m,  <0  =  0)  are  already  computed  and  stored.  Choose  <m  so  that 
the  integral  to  its  right  is  negligible. 

ii)  for  1  <  t  <  m,  we  compute  A.^+i, . . .,  A,n  recursively,  using 


Ai,i+i 


=  A,, 


eu 

(;  + 1) 


and  the  initial  condition  A,ft  from  (i) 

iii)  we  use  Aiy  as  a  numerical  approximation  to  b(j,  9),  ft  <  j  <  n. 

Similar  ideas  are  compatible  with  more  sophisticated  numerical  integration 
methods. 

Returning  to  (9),  we  note  that  the  i/f's  can  be  generated  (by  “inversion”)  as 
geometric  variates  with  parameter  X{Zi)/0,  in  compulation  time  independent 
of  X(Zi)/6;  e.g.,  see  Bratley,  Fox,  and  Schrage  [(1987),  §5.4.5].  Sometimes  null- 
jump  sequences  can  start  only  in  certain  states;  for  example,  in  the  M/M/1 
queue  only  from  the  state  corresponding  to  an  empty  system.  This  condition  is 
detected  automatically  when  X{Zi)  =  9. 

In  certain  applications  (particularly,  those  in  which  inf{A(r)  .  x  E.  S]  ^6), 
the  i/f’s  may  well  account  for  most  of  the  variance  of  £’[/!>'*].  This  suggests 
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that  it  may  pay  to  generate  multiple  V*  sequences  from  a  single  Z  by  inserting 
(say  Jb)  conditionally  independent  copies  of  Uq,  . . j  into  Z.  Generating 
such  a  copy  is  usually  trivial  relative  to  the  work  to  generate  Z,  because  each 
null-jump  sequence  is  generated  in  0(1)  time  (with  a  small  implicit  constant). 
We  then  use 

H{B)-l  k 

r=0  .=1 

to  estimate  a,  where  xj,-  =  ^ji 's  the  i-th  copy  of  i/^.  Our  (over¬ 

all)  estimator  of  a  is  the  average  of  such  iid  estimators  generated  for  a  given 
computer  budget.  Fox  and  Glynn  (1989b)  show  how  to  choose  it  to  maximize 
efficiency  as  a  special  case  of  their  results  on  splitting. 

4.3  Steady-state  sojourn  times 

Consider  estimating  the  expectation  of  a  function  g  of  system  sojourn  time  S 
for  customers  in  steady  state.  When  g  is  linear,  we  use  Little’s  law  as  detailed 
in  Section  3;  in  this  case,  it  does  not  pay  to  uniformize.  When  g  is  nonlinear, 
however,  this  problem  does  not  have  a  form  analogous  to  (1),  even  via  Little’s 
law,  unless  the  state  space  is  expanded  to  keep  track  of  customer  entry  and  exit 
times. 

Nonetheless,  we  can  still  convert  to  discrete  time  (and  not  expand  the  state 
space).  We  note  for  each  customer  the  difference  d  between  the  transition  num¬ 
bers  when  he  leaves  the  system  and  when  he  enters  it.  When  the  chain  is 
uniformized,  clearly  E(5(5)|d]  =  E[<;(c(d))]  where  e  is  an  Erlang  variate  with 
shape  parameter  d.  Without  uniformization,  finding  E[g{S)\Z]  requires  record¬ 
ing  each  customer’s  path  through  the  system,  numerically  computing  the  density 
of  each  customer’s  sojourn  time  (via  a  convolution)  at  many  grid  points,  and 
numerically  computing  the  corresponding  expectations;  this  is  much  harder. 


5  Discrete-time  conversion  for  deterministic¬ 
time  horizons;  theory 

Our  goal  in  this  section  is  to  use  discrete-time  conversion  to  estimate  a  =  E[I], 
where 

1=  /  /(X(t))G(dO-  (10) 

./[O.T] 

We  assume  that  A'  is  a  non-expiosive  continuous-time  Markov  chain  on  state 
space  S,  /  is  a  real-valued  function  defined  on  S,  T  is  deterministic,  and  G 
is  a  (deterministic)  non-decreasing  right-continuous  function  on  [0,oo]).  Such 
estimation  problems  arise  in  a  number  of  different  settings. 


11 


SETTING  4.  Suppose  that  we  interpret  /(*)  as  the  rate  at  which  cost  accrues 
when  the  process  occupies  state  x.  If  G(t)  =  t,  we  find  that  I  is  the  total  cost 
accumulated  over  the  horizon  [0,7^. 

SETTING  5.  Suppose  G{i)  =  —  e~'’‘)(r  >  0)  and  that  /  is  interpreted 

as  in  Setting  4.  Then,  I  is  the  r-discounted  cost  over  the  horizon  [0,  T]. 

SETTING  6.  Suppose  that  the  system  is  charged  an  amount  /(x)  if  the 
process  X  occupies  state  x  at  time  T.  Then,  the  expected  cost  o  charged  to  the 
system  is  given  by  a  =  £[/],  where  G(t)  =  /(i  >  T). 


Based  on  the  discussion  of  Section  3,  we  would  ideally  like  to  obtain  our 
discrete-time  estimator  for  a  by  conditioning  on  Z,  at  least  if  E[I\Z]  is  not 
hard  to  compute  relative  to  alternative  estimators.  The  following  proposition 
will  help  us  compute  our  conditional  expectations. 


PROPOSITION  2.  Let  ^  be  a  <T-field  and  suppose  that 

E  f  |/(X(<))|G(d<)  <  oo. 

Then, 

EUiQ]  =  /  E[fixmmdt). 

JlO.T] 


To  apply  this  proposition  to  the  calculation  of  E{I\Z\,  observe  that 

OO 

/(A'(<))  =  X^/(£„)/{5n-i  <<<£„} 

n=0 


so  that 

£[/(A'(<))|Z]=  f  r  fzAv}dv{fz,*  --*fz..,)(u)du. 

do  Jt-u 

Hence,  by  Proposition  2, 


oo  r  r* 

E[I\Z]  =  Y.  f(Zn)  /  /  exp(-A(Z„)(t  -  u))(/z„ 

n^O  ■'[OTl  Jo 


fz^.Ai^)duG{dt). 


Several  difficulties  with  Iz  =  El/IZ]  are  apparent.  First,  /z  depends  on  the 
entire  history  of  Z  and  therefore  requires  an  infinite  amount  of  time  to  compute 
(exactly).  Second,  the  summands  that  define  Z  involve  difficult  double  integrals 
that  do  not  simplify  significantly,  even  for  well-chosen  G. 
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SETTING  4  (continued).  Here  G(t)  =  t  so  that  the  double  integral  in  (11) 
becomes 


-U 

=  111  I(u  <  t  <  u-i- v)fz,(v)dv(fzo  *  ■  ■  ■  *  fz..t)Mdudt 

Jo  Jo  Jo 

=  111  /(u  <  t  <  u  + v)dt/z.(v)du(/2„  *  •  ■  •  ♦ /z._,)(u)du 

Jo  Jo  Jo 


nT-u 

vfz.(^)dv(fzo  *  ■  ■  ■  *  fz..^)Mdu 


+  f  /  {T-  *i)fzAv)dv{fzo  *■■■*  fz..,  )Mdu 

Jo  Jt-u 

=  [  f  vfz,iv)dv{fza*  -  *  fz..x){^)dtl 

Jo  Jo 

-  I  f  (v-T+  u)fz.{v)dv{fzo  *■■■*  fz..,  )(u)du 
Jo  Jt-u 

=  X{Zn)-^  r{fzo*  -*fz,.Mdu 

Jo 

-A(Z„)-^  /  exp(-A(Z„)(7’- u))(/z„  *  •••*/z..,)(u)du. 

Jo 


(12) 


Hence,  unless  the  convolutions  can  be  rapidly  calculated  (perhaps  as  in  Section 
4.1),  Iz  appears  impractical. 

Similar  difficulties  arise  when  we  wish  to  calculate  Iz  for  the  estimators 
that  arise  in  Settings  5  and  6.  Therefore,  we  now  consider  alternatives.  We  now 
assume  that  X  is  uniformizable  with  A  =  sup{A(ar)  :  r  G  5}  <  oo.  The  analog 
of  (5)  is  given  by 

E[I\Y^]  =  /(Ko*)- /  e-*‘G(dO  +  ^  /(y„*)  /  - - G(dt).  (13) 

Ao.t]  ^  ^1  J[o,T]  "• 


The  inner  (convolution)  integral  that  appears  in  (12)  is  analytically  calculated 
here:  it  becomes  Erlang.  Also,  for  certain  G,  the  conditional  expectation  /y#  = 
E[/iy*]  simplifies  further. 


SETTING  4  (continued).  For  G{t)  =  t,  we  get 


id 


OO 


,r(^ 

k\ 


13 


and  so 


E[I\Y^] 


I  oo  oo 

5  E /(>'')  E  ' 


n=0 


k=n+l 


k\ 


0TmL 

(fc+l)! 


Gross  and  Miller  (1984)  find  (14)  by  a  diflTerent  route. 


(14) 


SETTING  5  (continued).  For  G{t)  =  r“*(l  —  e  ’’*),  we  find  that 


I 


(0  +  r)'*+> 


dt 


n! 


so 


=  fJL^y  Y' 


ifcsn'f  1 


SETTING  6  (continued).  For  G(t)  =  I(i  >  T),  we  obtain 

nssO 

Just  as  in  Section  3,  the  choice  0  =  X  minimizes  both  the  average  work 
E[0{Iy»)]  and  the  variance  var  /y*.  The  arguments  are  identical  to  those  of 
Section  3. 

A  major  difficulty  with  the  estimators  is  that,  as  in  the  case  of  Iz,  they 
depend  on  the  entire  history  of  E*.  One  (obvious)  solution  terminates  the 
simulation  of  after  a  certain  fixed  number  of  transitions  (say  m)  and  deletes 
those  terms  from  the  estimators  which  depend  on  V^f  for  n  >  m.  For  several 
(practically-important)  G’s,  the  resulting  (truncation)  error  can  be  bounded  if 
/  is  bounded  using  bounds  on  Poisson  tails  in  Fox  and  Glynn  (1988).  If  /  is 
not  bounded,  it  seems  hard  or  impossible  to  get  useful  error  bounds.  In  any 
case.  Fox  and  Glynn  (1989c)  show  that  such  termination  strategies  are  also 
undesirable  in  terms  of  their  (theoretical)  large-sample  convergence  rate.  It  is 
shown  there  that  even  if  the  termination  index  is  chosen  to  depend  optimally  on 
the  budget  t,  the  resulting  estimator  will  converge  (slightly)  more  slowly  than 
in  the  budget  t.  This  contrasts  with  the  estimators  discussed  in  Section 
2,  where  Theorem  1  establishes  a  (canonical)  convergence  rate  of  for  the 
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estimator  r(t).  Consequently,  we  now  discuss  variants  of  the  above  estimators 
that  avoid  this  difficulty. 

Suppose  that  we  enrich  the  conditioning  (T-held  so  that  it  also  contains  in¬ 
formation  on  the  number  of  events  of  Nt  completed  by  time  V  >  T.  While 
this  will  clearly  increase  the  variance  of  the  estimator  (see  Proposition  1),  we 
would  expect  that  the  resulting  estimator  will  depend  on  the  history  of  only 
up  through  the  Ne(V)-th  transition.  This  decreases  the  work  required  to  cal¬ 
culate  the  estimator.  In  particular,  the  time  required  to  (exactly)  calculate  the 
estimator  will  now  be  finite,  whereas  the  time  to  calculate  is  infinite. 

Let  Qty  be  the  <r-field  generated  by  V*  and  Nt{V).  Then,  for  t  <  V , 


E[f{Xit))\YrN,(V)  =  m]  =  '£E[f(X(i))I{N,{t)  =  j)\Y»,NeiV)  =  m] 

j=Q 

m 

=  J2/(Y;)P{N»it)=j\Y\N,{V)  =  m} 

j=0 

m 

=  '£nYf)PWt)  =  jmv)  =  m]. 

i=o 

We  used  the  independence  of  Y*  and  Ns  in  the  last  step.  But 

Hence,  in  light  of  Proposition  2,  we  obtain 


(15) 

The  estimator  Iq,  ^  =  E[I\Qsy]  can  be  calculated  in  finite  time.  The  question 
naturally  arises  as  to  which  choice  of  9  and  V  minimizes  the  variance  of  Ict.v 
As  in  Section  3,  the  choice  0  =  A  minimizes  the  variance  for  any  (fixed)  value 
of  V ■  The  next  proposition  states  that  the  variance  of  non-increasing 

in  V. 


PROPOSITION  3.  If  var  /  <  oo,  then  var  E[I\Qsy]  is  non-increasing  in 
V>T. 


This  result  seems  to  suggest  that  we  should  choose  V  as  large  as  possible. 
This,  however,  has  the  effect  of  increasing  the  expected  time  required  to  compute 
Ic,  V-  Assuming  (reasonably)  that  the  average  work  increases  proportionately 
to  V,  this  suggests  choosing  V  to  minimize  V'c(V')  over  V  >T,  where  c(l'')  = 
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var  Ic,,v  Noting  that  the  <r-field  generated  by  Y*  ia  contained  in  we  may 
conclude  that  c(V)  >  var  /y«  for  all  V.  Hence,  Vc(V)  — ►  oo  as  K  — *  oo,  so  that 
we  are  unlikely  to  find  a  minimizing  value  of  V  (much)  larger  than  T.  Since 
the  minimizing  V  is  (probably)  close  to  T,  we  therefore  suggest  setting  V  =  T 
to  avoid  the  (expensive)  trial  runs  necessary  to  find  the  optimal  choice  of  V. 
Henceforth,  we  take  V  =  T  and  =  A. 

Given  this  choice  of  V,  (15)  simplifies  further  when  G  is  appropriately  chosen. 

SETTING  4  (continued).  For  G{t)  =  t,  consider 

The  integrand  is  a  beta  density  (up  to  a  proportionality  constant)  over  [0,  T], 
from  which  we  conclude  that  the  integral  is  T/(m  +  1).  So, 

Afx(T) 

ElI\gx.T]  =  T  52  f(Yj^)/(Nx(T)  +  l). 
i=o 

SETTING  5  (continued).  If  G{t)  =  r~^(l  ~  c""*),  then  Section  6.4  shows 
how  to  compute  the  set  of  required  integrals  with  no  numerical  integration  in 
O(L^)  time,  where  L  is  the  largest  Poisson  variate  generated  over  all  runs. 

SETTING  6  (continued).  Since  X{T)  =  Y^{N\{T)),  it  follows  that  when 
G(t)  =  I(t  >  T),  I  is  Qx^t  measurable  so  that  E[I\Qx^t]  =  /•  Hence,  condition¬ 
ing  on  Qx^t  just  leads  us  back  to  the  naive  estimator  I. 


We  conclude  this  section  with  the  introduction  of  an  estimator  that  replaces 
holding  time  r.v.’s  with  their  means,  yet  can  not  be  represented  as  a  condi¬ 
tional  expectation  of  the  form  E[I\Q].  Nevertheless,  we  classify  the  following 
estimator  as  a  discrete-time  estimator,  because  continuous  holding  time  r.v.’s 
are  integrated  out. 

Our  discussion  is  specific  to  G(t)  =  t.  We  return  to  formula  (12)  and  note 
that  it  equals 

A(Z„)-'(P{5„-i  <  T\Z)  -  P{5„_,  <  7',5„  >  TIZ})  =  A(Z„)-'P{5„  <  r|Z}. 


By  (11),  we  get 


T\Z). 
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Taking  expectations  of  both  sides  of  the  above  equation  gives 


N(T)-1 

E[I]  =  E  g 


n=0 


fjZn) 

A(Z„)’ 


where  N{T)  =  tnin{n  >  0  :  5„  >  7'}.  This  suggests  considering  an  estimator  for 
the  expected  total  reward  over  [0,7^,  based  on  replicates  of  the  r.v.  /j,  where 

s  m- 

We  claim  that  /<f  typically  can’t  be  represented  as  E[1\Q]  for  some  rr-field 
Q.  This  necessarily  implies  that  /j  is  distinct  from  E[l\Q\  r]-  We  prove  this 
claim  by  providing  an  example  in  which  var  Id  >  var  I.  If  Id  were  a  conditional 
expectation  of  7,  this  would  violate  Proposition  1.  To  obtain  the  <an  de,  just 
take  /  =  1  so  that  7  =  T.  Then,  var  7  =  0  but  var  Id  >  0. 

We  include  the  estimator  Id  in  our  paper  because  it  turns  out  that  Id  is 
sometimes  more  efficient  than  7^1 though  it  is  not  a  conditional  expectation. 
To  calculate  Iq^t  requires  generating  Yq,  Y}',.  . ,  Nx{T),  whereas  Id  is 
a  (deterministic)  function  of  Zo,Zi, . . . ,  N(T).  Since  Y^  includes  null 

jumps  not  present  in  Z,  N\(T)  >  N(T).  Hence,  £’[/?( 7^)]  >  E[0{I(;),  t)],  so  that 
Id  always  beats  in  terms  of  work. 

Furthermore,  Id  sometimes  beats  Iff^  T  in  terms  of  variance,  as  our  next 
example  illustrates.  Suppose  5  =  {0, 1}  and  A(0)  =  1,  A(l)  =  0,  so  that  state 
1  is  absorbing.  Clearly,  /(O)  =  1,/(1)  =  0,  and  note  that  Id  =  I{N{T)  > 
0)-  hx.r  =  T/(Ni{T)  +  1),  Then,  var  Id  =  e"'  -  Note  that  as  T  oo. 


Establishing  uniform  integrability  is  easy,  from  which  we  conclude  that  £’(7’/(A^i(7’)+ 
1)  —  1)^  ~  1/T  as  T  — *  oo.  Since  E\Iq^  t]  =  1  —  c"‘,  it  follows  that  var  ~ 

\/T  as  T  — *  oo.  Hence,  var  Id  ■C  var  Iq^  -j-  for  large  T. 


6  Discrete-time  conversion  for  fixed-time  hori¬ 
zons:  implementation 

Whereas  for  random-time  horizons  the  estimator  £"[71^]  is  sometimes  competi¬ 
tive,  for  fixed-time  horizons  E[7|Z]  seems  very  unlikely  to  be  competitive  with 
the  improvements  of  presented  here.  For  the  case  G{t)  =  <,  the  es¬ 

timator  Id  of  Section  5  might  be  competitive  with  these  improvements;  that 
estimator  is  straightforward  to  implement. 
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Our  first  improvement  of  E[I\Qt,T]  stratifies  Nt{T).  Let  6  =  PT.  Section  5 
indicates  that  picking  =  A  is  good,  but  here  we  allow  any  ^  >  A  for  flexibility. 
Stratum  one  is  the  integers  Q,. .  .,Ki,  and  stratum  two  is  the  remaining  positive 
integers.  We  choose  Kt  so  that  P{jV#(T)  <  /fj)  is  near  one.  Next,  we  integrate 
Nt(T)  out  of  stratum  one.  Although  the  second  stratum  has  an  estimator  con¬ 
ditioned  on  Nt{T)  as  well  as  V*,  the  variance  of  the  overall  estimator  relative  to 
that  of  E[I\Qg^T]  is  small.  Our  overall  estimator  q  is  unbiased.  Unlike  E[1\Y^\, 
it  can  be  computed  exactly  with  a  finite  amount  of  work.  Let  a\[t)  average 
iid  copies  of  a  generated  with  computer  budget  t.  Clearly,  Q’i(<)  converges  at 
the  canonical  rate  to  a.  Let  dr2(0  average  iid  copies  of  E[/l(T(y*,  A^j(T))] 
and  assume  (reasonably)  that  the  expected  work  to  generate  a  is  at  most  the 
expected  work  to  generate  E[I\a(Y* ,  Nt{T))].  Neglecting  rounding  in  stratifi¬ 
cation,  var  di(<)  <  var  daCO  “  holds  whenever  we  stratify  proportionally;  e.g., 
see  Bratley,  Fox,  and  Schrage  [(1987),  §2-4]. 

In  Section  6.3,  we  increase  the  efficiency  of  the  first-stratum  estimator  above 
via  splitting.  Though  the  expected  work  per  run  increases,  the  product  of  ex¬ 
pected  work  per  run  and  output  variance  per  run  decreases  -  just  what  we 
want  according  to  Theorem  1.  Call  d3(t)  the  resulting  (unbiased)  overall  es¬ 
timator  of  a.  We  get  var  d3(t)  <  var  di(t).  The  variance  decrease  can  be 
dramatic.  We  already  introduced  a  version  of  splitting  in  Section  4.2.  Here  is 
another  version.  First,  we  partition  stratum  one  into  $  =  {0, 1, . . . , and 
A  =  {K^  +  I, . . . ,  Ki)  where  |$(  |A|  but  P{Ns{T)  €  A)  is  (still)  near  one. 

If  6  is  large,  we  pick  a  few  standard  deviations  to  the  left  of  the  mean  (i.e., 
=  6  —  where  Ci  =  4  say)  and  Kg  a  few  standard  deviations  right 

of  the  mean  (i.e..  Kg  =  S  +  €26^^^  where  C2  =  4  say);  thus,  |<^1  =  0(6)  but 
1A|  =  0(6'^^).  When  the  simulation  reaches  we  split  it  into  (say)  k  sub¬ 
runs  all  starting  at  Yk^  and  going  on  to  Kg .  We  can  nest  the  version  of  splitting 
described  in  Section  4.2  inside  this  procedure. 

Section  6.4  shows  how  to  compute  the  integrals  in  (15)  for  general  G.  When 
G{t)  =  t  or  G{t)  =  I{T  >  <},  then  these  integrals  are  available  in  simple  closed 
form  as  Section  5  shows. 

If  the  chain’s  generator  is  piecewise  constant,  then  an  estimator  of  the  form 
E[I\Qg_T\  applies  to  each  piece,  with  the  final  state  for  piece  1  becoming  the 
initial  state  for  piece  i  -(-  1.  Since  except  for  the  last  piece  we  need  to  know 
the  final  state,  integrating  Nt{T)  out  of  stratum  one  works  only  for  the  last 
piece.  In  each  piece  we  simulate  a  stationary  process,  whereas  to  get  I  one  must 
simulate  a  nonstationary  process.  This  makes  I  a  less  efficient  estimator  relative 
to  E[I\Qe,T]  than  in  the  stationary  case.  Poisson  arrivals  with  a  piecewise- 
constant  intensity  to  a  system  which  is  otherwise  a  stationary  continuous-time 
Markov  chain  produce  a  piecewise-constant  generator. 
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6.1  Telescoping  null- jump  sequences 

Analogously  to  what  was  done  in  Section  4.2,  we  reexpress  Y*  as  {Zo,t/o  — 
—  1, . . .}  and  rewrite  E[I\Y^]  and  E[I\a{Y^,  A^j(T))]  accordingly. 


E[I\Y^]  =  nZo)  f  e-**Gidt)-^-e-^f(Zo)[4-mO,9)+0-'f2nZny„c(n,e) 
c{n,e)  =  / 

J[0. 


where 


0’'+H 


n.-0t 


-G{dt). 


(17) 

(18) 


E[I\o{y\N,{T))\ 

00 

=  <A^«(T)}  (19) 

1  =  0 

+/(Z^)[Ar,(T)  -  x:„]d(m,Ar,(T)) .  <  N,(T)  <  xU,) 

where 

and  Xj  =  Section  4.2  discusses  computation  of  integrals  almost  the 

same  as  c{n,6),  after  an  integration  by  parts.  Section  6.4  shows  how  to  com¬ 
pute  the  d(j,£ys  recursively  for  general  G.  Section  5  shows,  for  example,  that 
d{jj)  =  T/ie+  1)  when  G{t)  =  t. 


6.2  Stratification 

Integrating  the  Poisson  variate  out  of  the  first  stratum  gives 

K, 

=  i]P{Nt{T)  =  i\N0{T)  <  Ki] 

1  =  0 

=  (21) 

1=0  j=0 

=  (22) 

>=o 

where 

q  =  P{N0{T)<  K6]  (23) 
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(24) 


Kt 

ej  =  Y.d(j,e)e-^6‘/ei 

=  ej+i+d{j,j)[e-‘S’/j\],j<Kt.  (25) 

We  compute  the  ej’s  recursively  starting  from  j  =  Kb-  The  (bracketed)  “Pois¬ 
son”  terms  are  computed  recursively  starting  from  j  =  [6J  as  in  Fox  and  Glynn 
(1988).  To  simulate  (22),  use  (19)  with  d{j,  Nt{T))  replaced  by  Cj  and  Ne{T) 
replaced  by  Kb-  This  handles  stratum  one. 

To  handle  stratum  two,  we  force  N*(T)  to  fall  there  and  use  (19)  multiplied 
by  1/(1  —  q).  Devroye  (1986)  gives  an  0(1)  average-time  rejection  algorithm 
to  generate  variates  from  Poisson  right  tails  that,  with  (hypothetical)  infinite- 
precision  computers,  requires  no  right-hand  truncation  of  the  tail;  its  worst- 
case  time  is  unbounded.  Now  we  suggest  an  alternative  which  is  sometimes 
more  attractive.  For  several  important  G’s  and  bounded  /,  the  (close)  upper 
bounds  on  Poisson-tail  masses  in  Fox  and  Glynn  (1988)  let  us  find  a  truncation 
point  Mb  such  that  the  mass  to  its  right  is  negligible.  This  suggests  forcing 
Ng{T)  to  fall  between  Kb  +  I  and  Mb  and  using  “inversion”  to  generate  NsiT) 
thus  conditioned.  Most  of  the  Poisson  probabilities  required  for  “inversion”  are 
already  computed  in  (25),  since  KbIMb  ss  1.  Since  most  of  the  mziss  between 
Kb  +  I  and  Mb  is  concentrated  a.t  Kb  +  I,  straightforward  “inversion”  appears 
competitive  even  for  average  time.  Correspondence  on  this  point  with  Bruce 
Schmeiscr  was  helpful.  Alternatively,  “inversion”  can  be  implemented  with  the 
alias  method;  e.g.,  see  Bratley,  Fox,  and  Schrage  [(1987),  §5.2.8).  This  takes 
0(1)  marginal  time  per  variate  after  a  one-time  0{Mb  —  Kb)  setup.  Clearly 
Mb-Kb  =  0(^i/2). 

With  5  runs  altogether,  proportional  stratification  uses  (22)  on  runs 
and  stratum  two  on  [(1  —  9)SJ  runs.  On  the  remaining  runs,  it  selects  stratum 
one  with  probability  proportional  to  qs—  and  stratum  two  with  probability 
proportional  to  (1  —  q)S  —  [(1  —  ?)•?].  If  an  exact  algorithm  to  generate  from 
Poisson  tails  is  used,  no  bias  results.  Ignoring  rounding,  we  get  lower  variance 
than  by  averaging  iid  copies  of  E[I\<t{Y^ ,  ^9{T)]  -  even  without  the  additional 
variance  reduction  obtained  by  integrating  Ng{T)  out  of  stratum  one.  If  the 
output  variances  and  expected  unit  sampling  costs  for  each  stratum  were  known, 
we  could  get  even  higher  efficiency  by  modifying  the  strata  sampling  allocations 
accordingly;  e.g.,  see  Bratley,  Fox,  and  Schrage  [(1987), §2. 4).  One  could  estimate 
these  quantities  after  every  run  and  adaptive  allocate  subsequent  runs  to  strata 
accordingly;  this  is  a  topic  for  future  research. 

6.3  Splitting 

We  separate  the  sum  (22)  into  two  terms.  In  the  first  term  (say  Vi)  the  sum¬ 
mation  goes  from  0  to  K^.  Call  the  other  term  Vj-  The  weights  ej  in  l^i  are 
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collectively  small  for  many  important  G  such  as  G(t)  =  t.  When  this  occurs, 
the  main  function  of  Vi  is  to  supply  an  initial  state  Yx^+i  for  V^.  It  may  be 
that  V2  is  insensitive  to  V/r^+i.  More  importantly,  it  takes  0(6)  work  to  get 
Yk^+\  but  only  0(6^/^)  to  compute  V2  thereafter. 

We  now  split.  For  each  V\  we  take  (say)  6  copies  of  V2,  conditionally  inde¬ 
pendent  given  V/c,  +  i-  Call  these  replicates  V21,  V'iii  •  •  • ,  Vib-  Thus  (22)  and 

b 

Vb  =  v,-i-b-^'£v2^ 
j=t 

have  the  same  expectation.  Our  overall  estimator  of  that  expectation  averages 
of  iid  copies  of  Vb.  Fox  and  Glynn  (1989b)  show,  in  a  more  general  setting,  how 
to  pick  6  to  maximize  efficiency  in  the  sense  of  Theorem  1. 

Because  stratum  one  gets  almost  all  the  weight,  it  is  probably  not  worth 
extending  splitting  to  stratum  two. 

6.4  Computing  the  d{j,C)  ’5 

Let 

[  (t/Ty(\-t/TYG(dt). 

•'[O.T] 

Since 

;(j,f  +  i)  =  7(j.f)-/(j  +  i,f), 

we  get 

d{he+  l)  =  [(6+ l)/(6+  1 -J)]d(j,  6) -l(j  +  l)/(6+  l-i)]d(j+  1,6-hl)  (26) 
where 

d(£,e)  =  1(6,0) 

1(0,0)  =  G(T-)-G(0-). 

Thus,  given  d(£,  £)  for  £  =  0, 1, . . . ,  L,  we  compute  d(j,  £)  for  j  ^  £  and  0  <  j,£  < 
L  from  (26)  recursively  with  O(L^)  work  and  memory.  So  at  most  L  + 1  numeri¬ 
cal  integrations  are  needed.  Since  the  integrand  for  I(£,  £)  is  just  (</T’)(l  —  i/T) 
times  the  integrand  for  I(£  —  l,f  —  1),  ideas  similar  to  those  in  Section  4.2 
expedite  computation  of  {!(£,  £)  :  £=0,...,L}. 

When  G(dt)  =  e~''^dt,  then 

d(0,0)=(l-e^^)/r 

and  integrating  by  parts  gives  d(£,£)  =  (£/rT)d(£  —  1,£—  1)  —  e”''^/r.  So  no 
numerical  integration  is  needed  for  continuous  discounting. 

If  the  accurate  but  approximate  “inversion”  method  for  generating  variates 
from  Poisson  tails  suggested  in  Section  6.2  is  used,  pick  L  =  Mi.  If  an  exact 
method  is  used,  generate  all  the  Poisson  variates  needed  over  all  runs  at  the 
start;  the  largest  of  these  is  L. 
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APPENDIX 


Proof  of  Theorem  1.  If  (r^(Ri)  =  0,  the  result  is  immediate  since  r(t)  =  o  a.s. 
If  (T^{Ri)  >  0,  then  the  classical  central  limit  theorem  implies  that 

ais  n  oo.  Classical  renewal  theory  shows  that  N{t)/t  — ►  \IE[P{R\)\  a.s.  as 
t  —>■  oo.  Apply  a  random  time  change  theorem  [Billingsley  (1968),  p.  146]  to 
get 

Nityf^rit)  -  o)  cr{R,)N(0,l) 

as  t  — *  oo.  A  converging-together  argument  [e.g.,  see  Billingsley  (1968),  p.  25] 
yields  the  theorem. 

Proof  of  Theorem  2.  For  >  A,  we  shall  find  a  tr-field  Kg  such  that  (t{Y^)  C  Kg 
and 

E[I\lCg]  =  £[7|y*]. 

Then,  Proposition  1  implies  that  var  {E[I\ICg])  >  var  /y»  and  so,  once  we 
construct  a  suitable  Kg,  we  have  proved  that  setting  ^  =  A  minimizes  variance. 

That  construction  begins  with  an  iid  sequence  (r;,  :  »  >  1)  of  Bernoulli 
variates,  independent  of  V*  and  Ng,  each  with  P{^i  =  1}  =  X/0.  We  use  it  to 
thin  null  jumps  of  Y*  to  obtain  Y*.  The  i-th  null  jump  is  retained  if  and  only 
if  r)i  =  1. 

Now,  we  set  Kg  —  <t{Y^ . . .).  Since  /  is  a  (measurable)  function  of 
Y«  and  Ng,  we  get 

E[I\Y\vum,--]  =  E[I\y'] 

using  (3)  on  p.  308  of  Chung  (1974),  with  Ti  =  <T{Y^,Ng),  T2  =  a{Y^), 
£3  =  .  ■ .),  completing  the  proof. 


Proof  of  Proposition  2.  The  right-hand  side,  namely  j-^E[f{X{t))\Q]G{dt), 
is  a  ^-measurable  r.v.  Also,  if  A  €  ^,  then  Fubini  implies  that 


,T] 


E[fixmg]G{dt)P{dw) 


I  I  E[f{XmG]P(dw)Gidt) 

JiO.T]  Ja 

I  f  fiX(t))P{dw)G(dt) 

J[o,T)  Ja 

f  (  f{X{t))G{dt)P{dw). 

Ja  -/[o.t] 


Using  the  defining  relation  for  conditional  expectations,  we  see  that  the  propo¬ 
sition  is  proved. 
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Proof  of  Proposition  S.  Suppose  that  <2  >  >  T.  Note  that  Ne(t2)  — 

is  independent  of  Qt  ti  and  ff{Ng{s)  ;  s  <  ti).  Hence,  we  get 

E[I\Q».tA  =  E[J\QeM,Ne(t2)  -  Ns(U)] 

from  (3)  on  p.  308  of  Chung  (1974),  with  jTj  =  :  s  <  ti)  V 

T2  =  Qe.ti,  E3  =  o{Ns{t2)  —  Ns(ti)),  noting  that  I  is  P’1  measurable.  But 
Ge.t,  <  Gs.ii  V  (T{Na{t2)  -  Naiti)),  so  var  E[I\Qajj]  <  var  ,  A^«(<2)  - 

Na(ti)]  =  var  E[I\Ga,tx\,  by  Proposition  1. 
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(and  possibly  other  information),  we  reduce  variance.  This  is  discrete-time  con¬ 
version.  We  further  increase  efficiency  by  combining  discrete-time  conversion 
with  stratification  and  splitting. 
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